Abstract: Low-complexity precoding algorithms are proposed
in this work to reduce the computational complexity and improve
the performance of regularized block diagonalization (RBD)
based precoding schemes for large multi-user MIMO (MU-MIMO)
systems. The proposed algorithms are based on a channel
inversion technique, QR decompositions, and lattice reductions
to decouple the MU-MIMO channel into equivalent SU-MIMO
channels. Simulation results show that the proposed precoding
algorithms can achieve almost the same sum-rate performance
as RBD precoding, substantial bit error rate (BER) performance
gains, and a simplified receiver structure, while requiring a lower
complexity.

I. Introduction 

Block diagonalization (BD) based precoding techniques 
[1], [3] are well-known precoding strategies for multi-
user multiple-input multiple-output (MU-MIMO) systems. By 
implementing two SVD operations, BD precoding can elim- 
inate the multi-user interference (MUI). Since BD precoding 
focuses on canceling the MUI, it suffers a performance loss 
at low signal to noise ratios (SNRs) when the noise is the 
dominant factor. By relaxing the zero MUI constraint, the reg- 
ularized block diagonalization (RBD) precoding scheme has 
been proposed in [4]. Instead of achieving strictly independent
parallel channels between the users as in BD precoding, RBD
precoding allows a small level of interference between the
users. Although RBD precoding achieves a better performance,
it still needs two SVD operations, as BD precoding does.
As revealed in this paper, the computational complexity of the 
RBD precoding algorithm depends on the number of users 
and the dimensions of each user's channel matrix which could 
result in a considerable computational cost for large MIMO 
systems. The high cost of the two SVD operations required by 
the RBD precoding suggests that precoding algorithms with 
lower complexity should be investigated for use in very large 
MIMO systems. 

In order to reduce the computational complexity, a gener- 
alized MMSE channel inversion (GMI) precoding algorithm 
has been proposed in [5] to implement the RBD precoding
scheme. The first SVD operation of the RBD precoding is 
implemented by a matrix inversion method in GMI precoding. 
In [6], the first SVD operation of the RBD precoding is
replaced with a less complex QR decomposition; we refer to
this scheme as QR/SVD RBD precoding. For the second SVD operation,
however, both the GMI and the QR/SVD RBD precoding 
algorithms require the same number of operations as the
original RBD precoding scheme. If the second SVD operation
is implemented at the transmit side, then the corresponding 
unitary decoding matrix needs to be known by each distributed 
receiver, which requires an extra control overhead [6]. In
this work, we develop a simplified GMI (S-GMI) method 
to obtain the first precoding filters. In order to reduce the 
complexity further and to obtain a better BER performance, we 
transform the equivalent SU-MIMO channels into the lattice 
space after the first precoding process by utilizing the lattice 
reduction (LR) technique [8], whose complexity is mainly due
to a QR decomposition. Then, a linear precoding algorithm is 
employed instead of the second SVD operation to parallelize 
each user's streams. 

The essential premise of using transmit processing tech- 
niques is the knowledge of the channel state information 
(CSI) at the transmitter IT] - Q. In time-division duplexing 
(TDD) systems, CSI can be obtained at the BS by exploit- 
ing reciprocity between the forward and reverse links. In 
frequency-division duplexing (FDD) systems, reciprocity is 
usually not available, but the BS can obtain knowledge of 
the downlink user channels by allowing the users to send a 
small number of feedback bits on the uplink [9], [10]. We
assume that full CSI is available at the transmit side since
limited feedback techniques are not the main focus of this
work. It is worth noting that, with the proposed algorithms, the two SVD
operations and the decoding matrix at each receiver are no
longer required. The computational complexity is thus reduced and
the receiver structure can be simplified. A significant amount
of power can be saved, which is very important considering the
mobility of distributed users. For convenience, the proposed
precoding algorithm is abbreviated as LR-S-GMI. According 
to the specific precoding constraint, the proposed LR-S-GMI 
precoding algorithms are categorized as LR-S-GMI-ZF and 
LR-S-GMI-MMSE precoding, respectively. We compare the 
proposed LR-S-GMI technique to the precoding algorithms 
reported in the literature including the BD, RBD, QR/SVD 
RBD and GMI precoding algorithms. 

This paper is organized as follows. The system model is 
given in Section II. A brief review of the RBD precoding al- 
gorithms is presented in Section III. The proposed LR-S-GMI 
precoding algorithms are described in detail in Section IV. 
Simulation results and conclusions are presented in Section V 
and Section VI, respectively. 



II. System Model 

Fig. 1. MU-MIMO System Model

We consider an uncoded MU-MIMO downlink channel
with $N_T$ transmit antennas at the base station (BS) and $N_i$
receive antennas at the $i$th user equipment (UE). With $K$
users in the system, the total number of receive antennas
is $N_R = \sum_{i=1}^{K} N_i$. A block diagram of such a system is
illustrated in Fig. 1. From the system model, the combined
channel matrix $\mathbf{H}$ and the combined precoding matrix $\mathbf{P}$ of
all users are given by

$$\mathbf{H} = [\mathbf{H}_1^T \; \mathbf{H}_2^T \; \cdots \; \mathbf{H}_K^T]^T \in \mathbb{C}^{N_R \times N_T}, \tag{1}$$

$$\mathbf{P} = [\mathbf{P}_1 \; \mathbf{P}_2 \; \cdots \; \mathbf{P}_K] \in \mathbb{C}^{N_T \times N_R}, \tag{2}$$

where $\mathbf{H}_i \in \mathbb{C}^{N_i \times N_T}$ is the $i$th user's channel matrix. The
quantity $\mathbf{P}_i \in \mathbb{C}^{N_T \times N_i}$ is the $i$th user's precoding matrix. We
assume a flat fading MIMO channel and the received signal
$\mathbf{y}_i \in \mathbb{C}^{N_i}$ at the $i$th user is given by



$$\mathbf{y}_i = \mathbf{H}_i \mathbf{x}_i + \mathbf{H}_i \sum_{j=1, j \neq i}^{K} \mathbf{x}_j + \mathbf{n}_i, \tag{3}$$



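where the quantity $\mathbf{x}_i \in \mathbb{C}^{N_T}$ is the $i$th user's transmit
signal, and $\mathbf{n}_i \in \mathbb{C}^{N_i}$ is the $i$th user's Gaussian noise with
independent and identically distributed (i.i.d.) entries of zero
mean and variance $\sigma_n^2$. Assuming that the average transmit
power for user $i$ is $E_{s_i}$, we construct a normalized signal $\mathbf{x}_i$
such that

$$\mathbf{x}_i = \frac{\mathbf{s}_i}{\sqrt{\gamma_i}}, \tag{4}$$

where $\mathbf{s}_i = \mathbf{P}_i \mathbf{d}_i$ with $\mathbf{d}_i$ being the data vector, and $\gamma_i =
\|\mathbf{s}_i\|_2^2 / E_{s_i}$. With this normalization, the transmit signal $\mathbf{x}_i$
obeys $E\|\mathbf{x}_i\|_2^2 = E_{s_i}$.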

The received signal $\mathbf{y}_i$ is weighted by the scalar $\sqrt{\gamma_i}$ to
form the estimate

$$\hat{\mathbf{d}}_i = \sqrt{\gamma_i}\, \mathbf{y}_i, \tag{5}$$

where the physical meaning of the scalar $\sqrt{\gamma_i}$ is to make sure
that the average transmit power $E_{s_i}$ is still the same after the
precoding process. Note that it is necessary to cancel $\sqrt{\gamma_i}$ out
at the receiver to get the correct amplitude of the desired signal
part.
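The normalization in (4) and the receiver scaling in (5) can be checked numerically. The following is a minimal NumPy sketch under stated assumptions: the precoder `P_i` is a random placeholder rather than an RBD design, the helper `crandn`, the antenna counts, and `E_s = 1` are illustrative choices that do not come from the paper, and noise and multi-user interference are omitted so that only the scaling is visible.

```python
import numpy as np

rng = np.random.default_rng(0)

N_T, N_i = 8, 2   # illustrative antenna counts (assumed)
E_s = 1.0         # assumed average transmit power for user i


def crandn(rows, cols):
    """Hypothetical helper: i.i.d. complex Gaussian entries with unit variance."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)


P_i = crandn(N_T, N_i)                       # placeholder precoder, not an RBD design
H_i = crandn(N_i, N_T)                       # channel of user i
bits = rng.integers(0, 2, size=(N_i, 2))     # two bits per QPSK symbol
d_i = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

s_i = P_i @ d_i                              # precoded signal before normalization
gamma_i = np.linalg.norm(s_i) ** 2 / E_s     # scaling factor gamma_i
x_i = s_i / np.sqrt(gamma_i)                 # normalized transmit signal, as in (4)

y_i = H_i @ x_i                              # noise-free, interference-free reception
d_hat = np.sqrt(gamma_i) * y_i               # receiver scaling, as in (5)

print("transmit power:", np.linalg.norm(x_i) ** 2)
print("scaling undone:", np.allclose(d_hat, H_i @ P_i @ d_i))
```

In this simplified setting the transmit power equals `E_s` regardless of how the placeholder precoder scales the data, and the receiver scaling restores the amplitude of `H_i @ P_i @ d_i`.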



III. Review of RBD Precoding Algorithm 

The design of the RBD precoding algorithm is performed 
in two steps [4]. The first precoding filter is used to balance
the MUI with noise, then approximate parallel SU-MIMO 
channels are obtained. The second precoding filter is imple- 
mented to parallelize each user's streams. Correspondingly, the 
precoding matrix P can be rewritten as 



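$$\mathbf{P} = \mathbf{P}^a \mathbf{P}^b, \tag{6}$$

where $\mathbf{P}^a = [\mathbf{P}_1^a \; \mathbf{P}_2^a \; \cdots \; \mathbf{P}_K^a]$ and $\mathbf{P}^b =
\mathrm{diag}\{\mathbf{P}_1^b, \mathbf{P}_2^b, \ldots, \mathbf{P}_K^b\}$. We exclude the $i$th user's channel
matrix and define $\tilde{\mathbf{H}}_i$ as

$$\tilde{\mathbf{H}}_i = [\mathbf{H}_1^T \; \cdots \; \mathbf{H}_{i-1}^T \; \mathbf{H}_{i+1}^T \; \cdots \; \mathbf{H}_K^T]^T \in \mathbb{C}^{\bar{N}_i \times N_T}, \tag{7}$$

where $\bar{N}_i = N_R - N_i$. Thus, the interference generated to the
other users is determined by $\tilde{\mathbf{H}}_i \mathbf{P}_i^a$. In order to balance the
MUI and the noise term, an RBD constraint is developed in
[4] and given by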



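$$\mathbf{P}^a = \min_{\mathbf{P}^a} E\left\{ \sum_{i=1}^{K} \|\tilde{\mathbf{H}}_i \mathbf{P}_i^a\|^2 + \gamma \|\mathbf{n}\|_2^2 \right\} \quad \text{s.t.} \quad E\|\mathbf{x}\|_2^2 = E_s. \tag{8}$$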

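Assuming that the rank of $\tilde{\mathbf{H}}_i$ is $L_i$, define the SVD of $\tilde{\mathbf{H}}_i$

$$\tilde{\mathbf{H}}_i = \tilde{\mathbf{U}}_i \tilde{\boldsymbol{\Sigma}}_i \tilde{\mathbf{V}}_i^H = \tilde{\mathbf{U}}_i \tilde{\boldsymbol{\Sigma}}_i [\tilde{\mathbf{V}}_i^{(1)} \; \tilde{\mathbf{V}}_i^{(0)}]^H, \tag{9}$$

where $\tilde{\mathbf{U}}_i \in \mathbb{C}^{\bar{N}_i \times \bar{N}_i}$ and $\tilde{\mathbf{V}}_i \in \mathbb{C}^{N_T \times N_T}$ are unitary matrices.
The diagonal matrix $\tilde{\boldsymbol{\Sigma}}_i \in \mathbb{C}^{\bar{N}_i \times N_T}$ contains the singular
values of the matrix $\tilde{\mathbf{H}}_i$. As shown in [4], the solution for
the RBD constraint can be obtained as

$$\mathbf{P}_i^{a(\mathrm{RBD})} = \tilde{\mathbf{V}}_i (\tilde{\boldsymbol{\Sigma}}_i^T \tilde{\boldsymbol{\Sigma}}_i + \alpha \mathbf{I}_{N_T})^{-1/2}, \tag{10}$$

where $\alpha = N_R \sigma_n^2 / E_s$ is the regularization parameter.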

After the first RBD precoding process, the MU-MIMO 
channel is decoupled into a set of K approximately parallel 
SU-MIMO channels. Due to the regularization process, there 
are small residual interferences between these channels, and 
these interferences tend to zero at high SNRs. Thus, the 
effective channel matrix for the ith user can be expressed as 

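$$\mathbf{H}_{\mathrm{eff}_i} = \mathbf{H}_i \mathbf{P}_i^a. \tag{11}$$

Define $L_{\mathrm{eff}} = \mathrm{rank}(\mathbf{H}_{\mathrm{eff}_i})$ and consider the second SVD
operation on the effective channel matrix

$$\mathbf{H}_{\mathrm{eff}_i} = \mathbf{U}_i \boldsymbol{\Sigma}_i \mathbf{V}_i^H, \tag{12}$$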



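using the unitary matrices $\mathbf{U}_i \in \mathbb{C}^{L_{\mathrm{eff}} \times L_{\mathrm{eff}}}$ and $\mathbf{V}_i \in
\mathbb{C}^{N_T \times N_T}$. The second precoding filters for RBD precoding
can be obtained as

$$\mathbf{P}_i^{b(\mathrm{RBD})} = \mathbf{V}_i \boldsymbol{\Lambda}^{(\mathrm{RBD})}, \tag{13}$$

where $\boldsymbol{\Lambda}$ is the power loading matrix that depends on the
optimization criterion. An example power loading is the water
filling (WF) [12]. The $i$th user's decoding matrix is obtained
as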



$$\mathbf{G}_i = \mathbf{U}_i^H, \tag{14}$$



which needs to be known by each user's receiver. 

Note that for the conventional RBD precoding algorithm, 
there is a dimensionality constraint to be satisfied 

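$$N_T > \max\{\mathrm{rank}(\tilde{\mathbf{H}}_1), \mathrm{rank}(\tilde{\mathbf{H}}_2), \ldots, \mathrm{rank}(\tilde{\mathbf{H}}_K)\}. \tag{15}$$

Then, we can get the matrix dimension relationship as $L_{\mathrm{eff}} \leq
N_i < \bar{N}_i < N_R \leq N_T$. Note that the first SVD operation in
(9) needs to be implemented $K$ times on $\tilde{\mathbf{H}}_i$ with dimension
$\bar{N}_i \times N_T$ and the second SVD operation in (12) needs to be
implemented $K$ times on $\mathbf{H}_{\mathrm{eff}_i}$ with dimension $L_{\mathrm{eff}} \times N_T$.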
From the above analysis, most of the computational com- 
plexity of the RBD precoding algorithm comes from the two 
SVD operations which make the computational complexity of 
the RBD precoding algorithm increase with the number of 
users K and the system dimensions. In order to reduce the 
computational complexity of the RBD precoding algorithm, 
low complexity precoding algorithms for MU-MIMO systems 
are proposed in what follows. 
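The first RBD step reviewed above, namely (7), (9) and (10), can be sketched numerically. This is an illustrative NumPy sketch and not the reference implementation of [4]: the helper names `crandn` and `rbd_first_filter`, the noise variance, and the transmit power are assumptions made for the example, and the channels are random draws.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N_i, N_T = 4, 2, 8            # the (2, 2, 2, 2) x 8 configuration
N_R = K * N_i
sigma_n2, E_s = 0.01, 1.0        # assumed noise variance and transmit power
alpha = N_R * sigma_n2 / E_s     # regularization parameter


def crandn(rows, cols):
    """Hypothetical helper: i.i.d. complex Gaussian entries with unit variance."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)


H_users = [crandn(N_i, N_T) for _ in range(K)]


def rbd_first_filter(H_users, i, alpha):
    """Return (P_i^a, H_tilde_i) following (7), (9) and (10)."""
    # Stack the channels of all users except user i, as in (7)
    H_tilde = np.vstack([H_j for j, H_j in enumerate(H_users) if j != i])
    _, s, Vh = np.linalg.svd(H_tilde)         # full SVD, Vh is N_T x N_T
    n_t = H_tilde.shape[1]
    sigma2 = np.zeros(n_t)
    sigma2[:s.size] = s ** 2                  # diagonal of Sigma^T Sigma
    D = np.diag((sigma2 + alpha) ** -0.5)     # (Sigma^T Sigma + alpha I)^(-1/2)
    return Vh.conj().T @ D, H_tilde


P_a_0, H_tilde_0 = rbd_first_filter(H_users, 0, alpha)
leak = np.linalg.norm(H_tilde_0 @ P_a_0)      # residual MUI (Frobenius norm)
gain = np.linalg.norm(H_users[0] @ P_a_0)     # effective channel of the desired user
print(leak, gain)
```

Because each singular value is mapped to $\sigma / \sqrt{\sigma^2 + \alpha}$, the leaked interference in this sketch is bounded by $\sqrt{\bar{N}_i}$, while the directions in the null space of $\tilde{\mathbf{H}}_i$ are weighted by $1/\sqrt{\alpha}$. One full SVD per user is required, which is the cost the following section aims to avoid.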

IV. Proposed Low Complexity LR-S-GMI Precoding 
Algorithms 

In this section, we describe the proposed low-complexity 
LR-S-GMI precoding algorithms based on a strategy that employs
a channel inversion method [5], QR decompositions, and
lattice reductions. Similar to the RBD precoding algorithm, 
the design of the proposed LR-S-GMI precoding algorithms 
is computed in two steps. 

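First, we obtain $\mathbf{P}^a$ in the conventional RBD precoding
algorithm for the LR-S-GMI precoding algorithms by using
one channel inversion and $K$ QR decompositions. By applying
the MMSE channel inversion, we have

$$\mathbf{H}_{\mathrm{mse}}^{\dagger} = (\mathbf{H}^H \mathbf{H} + \alpha \mathbf{I})^{-1} \mathbf{H}^H = [\mathbf{H}_{1,\mathrm{mse}}, \mathbf{H}_{2,\mathrm{mse}}, \ldots, \mathbf{H}_{K,\mathrm{mse}}]. \tag{16}$$

Considering a high SNR case, it can be shown that $\mathbf{H} \mathbf{H}_{\mathrm{mse}}^{\dagger} \approx
\mathbf{I}_{N_R}$ [5]. This means that the off-diagonal block matrices of
$\mathbf{H} \mathbf{H}_{\mathrm{mse}}^{\dagger}$ converge to zero as the SNR increases. Then, we
obtain a condition which shows that $\mathbf{H}_{i,\mathrm{mse}}$ is in the null
space of $\tilde{\mathbf{H}}_i$

$$\tilde{\mathbf{H}}_i \mathbf{H}_{i,\mathrm{mse}} \approx \mathbf{0}. \tag{17}$$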



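Considering the QR decomposition of $\mathbf{H}_{i,\mathrm{mse}} = \mathbf{Q}_{i,\mathrm{mse}} \mathbf{R}_{i,\mathrm{mse}}$, we have

$$\tilde{\mathbf{H}}_i \mathbf{H}_{i,\mathrm{mse}} = \tilde{\mathbf{H}}_i \mathbf{Q}_{i,\mathrm{mse}} \mathbf{R}_{i,\mathrm{mse}} \approx \mathbf{0} \quad \text{for } i = 1, \ldots, K, \tag{18}$$

where $\mathbf{R}_{i,\mathrm{mse}} \in \mathbb{C}^{N_i \times N_i}$ is an upper triangular matrix and
$\mathbf{Q}_{i,\mathrm{mse}} \in \mathbb{C}^{N_T \times N_i}$ is an orthogonal matrix. Since $\mathbf{R}_{i,\mathrm{mse}}$ is
invertible, we have

$$\tilde{\mathbf{H}}_i \mathbf{Q}_{i,\mathrm{mse}} \approx \mathbf{0}. \tag{19}$$

Thus, $\mathbf{Q}_{i,\mathrm{mse}}$ satisfies the RBD constraint (8) to balance the
MUI and the noise.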
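The chain (16) to (19) can be illustrated with one matrix inversion followed by $K$ reduced QR decompositions. The NumPy sketch below makes several assumptions that are not taken from the paper: the channel is a random draw, `alpha` is set to a very small value to emulate the high-SNR regime, and the leakage-to-gain ratio is only an ad hoc diagnostic chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(2)
K, N_i, N_T = 4, 2, 8
N_R = K * N_i
alpha = 1e-8                     # assumed very small alpha (high-SNR regime)

H = (rng.standard_normal((N_R, N_T))
     + 1j * rng.standard_normal((N_R, N_T))) / np.sqrt(2)

# MMSE channel inversion (16): an N_T x N_R matrix with one column block per user
H_mse = np.linalg.solve(H.conj().T @ H + alpha * np.eye(N_T), H.conj().T)

blocks, ratios = [], []
for i in range(K):
    rows = np.arange(i * N_i, (i + 1) * N_i)
    Q, R = np.linalg.qr(H_mse[:, rows])      # reduced QR, Q is N_T x N_i
    H_tilde = np.delete(H, rows, axis=0)     # channels of the other users
    blocks.append(Q)                         # candidate first precoding filter
    leak = np.linalg.norm(H_tilde @ Q)       # interference leaked to other users
    gain = np.linalg.norm(H[rows] @ Q)       # effective channel of user i
    ratios.append(leak / gain)

P_a = np.hstack(blocks)                      # combined first precoding matrix
print(P_a.shape, max(ratios))
```

With a larger `alpha` the leakage no longer vanishes, which reflects the trade-off between MUI and noise expressed by the RBD constraint.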

We have simplified the design of $\mathbf{P}^a$ for the RBD precoding
here as compared to [5], where a residual interference suppression
filter $\mathbf{T}_i$ is applied after $\mathbf{P}^a$. The filter $\mathbf{T}_i$ increases the
complexity and cannot completely cancel the MUI. Therefore,
we omit the residual interference suppression part since it is
not necessary for the RBD precoding. We term the simplified
GMI as S-GMI precoding in this work. Then, the first
precoding matrix can be obtained as

$$\mathbf{P}_i^a = \mathbf{Q}_{i,\mathrm{mse}}, \tag{20}$$

and the first combined precoding matrix is

$$\mathbf{P}^a = [\mathbf{P}_1^a \; \mathbf{P}_2^a \; \cdots \; \mathbf{P}_K^a]. \tag{21}$$



Next, we employ the LR-aided linear precoding algorithm
instead of the second SVD operation to obtain $\mathbf{P}_i^b$ and parallelize
each user's streams. The aim of the LR transformation is
to find a new basis $\tilde{\mathbf{H}}$ which is nearly orthogonal compared to
the original matrix $\mathbf{H}$ for a given lattice $L(\mathbf{H})$. The most commonly
used LR algorithm was first proposed by Lenstra,
Lenstra, and Lovász (LLL) in [14] with polynomial time
complexity. In order to reduce the computational complexity,
a complex LLL (CLLL) algorithm was proposed in [8], which
reduces the overall complexity of the LLL algorithm by
nearly half without sacrificing any performance. We employ
the CLLL algorithm to implement the LR transformation in
this work.

After the first precoding, we transform the MU-MIMO 
channel into approximate parallel SU-MIMO channels and the 
effective channel matrix for the ith user is 



$$\mathbf{H}_{\mathrm{eff}_i} = \mathbf{H}_i \mathbf{P}_i^a. \tag{22}$$



We perform the LR transformation on $\mathbf{H}_{\mathrm{eff}_i}^T$ in the precoding
scenario [15], that is



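$$\tilde{\mathbf{H}}_{\mathrm{eff}_i} = \mathbf{T}_i \mathbf{H}_{\mathrm{eff}_i} \quad \text{and} \quad \mathbf{H}_{\mathrm{eff}_i} = \mathbf{T}_i^{-1} \tilde{\mathbf{H}}_{\mathrm{eff}_i}, \tag{23}$$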



where $\mathbf{T}_i$ is a unimodular matrix ($|\det(\mathbf{T}_i)| = 1$) and all
elements of $\mathbf{T}_i$ are complex integers, i.e., $t_{l,k} \in \mathbb{Z} + j\mathbb{Z}$.
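The two properties required of $\mathbf{T}_i$ can be verified on a small example. The sketch below does not run the CLLL algorithm: the matrix `T` is a hand-picked, hypothetical Gaussian-integer matrix, and `H_eff` is a random stand-in for an effective channel. It only shows that such a transformation has a unit-modulus determinant, that its inverse is again a Gaussian-integer matrix, and that the original channel is recovered through $\mathbf{T}_i^{-1}$ as in (23).

```python
import numpy as np

# Hypothetical Gaussian-integer transformation matrix (illustrative only)
T = np.array([[1, 1 + 1j],
              [0, 1]], dtype=complex)


def is_gaussian_integer(M, tol=1e-9):
    """True if every entry of M lies in Z + jZ up to rounding error."""
    rounded = np.round(M.real) + 1j * np.round(M.imag)
    return bool(np.all(np.abs(M - rounded) < tol))


det_T = np.linalg.det(T)
T_inv = np.linalg.inv(T)

rng = np.random.default_rng(3)
H_eff = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
H_tr = T @ H_eff                  # transformed channel, same form as (23)
H_back = T_inv @ H_tr             # original channel recovered with T^{-1}

print(abs(det_T), is_gaussian_integer(T_inv), np.allclose(H_back, H_eff))
```

An actual LR algorithm would choose `T` so that the rows of the transformed channel are shorter and closer to orthogonal; that search is outside the scope of this example.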

Following the LR transformation, we employ the linear 
precoding constraint to get the second precoding matrix in- 
stead of the second SVD operation in (12). The ZF precoding 
constraint is implemented for user i as 



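$$\tilde{\mathbf{P}}_{\mathrm{ZF}_i}^b = \tilde{\mathbf{H}}_{\mathrm{eff}_i}^H (\tilde{\mathbf{H}}_{\mathrm{eff}_i} \tilde{\mathbf{H}}_{\mathrm{eff}_i}^H)^{-1}. \tag{24}$$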
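The algebraic form of the ZF filter in (24) is a right pseudo-inverse. As a minimal sketch, assuming a random full-row-rank stand-in `H_eff` with hypothetical dimensions and omitting the LR transformation (that is, taking $\mathbf{T}_i = \mathbf{I}$), the product of the channel and the filter is the identity:

```python
import numpy as np

rng = np.random.default_rng(4)
N_i, N_cols = 2, 4     # hypothetical dimensions of one effective channel

H_eff = (rng.standard_normal((N_i, N_cols))
         + 1j * rng.standard_normal((N_i, N_cols)))

# Right pseudo-inverse, the same algebraic form as (24)
P_zf = H_eff.conj().T @ np.linalg.inv(H_eff @ H_eff.conj().T)

# The streams of the user are fully decoupled: H_eff @ P_zf is the identity
print(np.round(H_eff @ P_zf, 6))
```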



It is well known that MMSE precoding generally performs
better than ZF precoding. We can get the
second precoding filter by employing an MMSE precoding
constraint. The MMSE precoding is actually equivalent to ZF
precoding with respect to an extended system model [15], [17].
The extended channel matrix $\underline{\mathbf{H}}$ for the MMSE precoding
scheme is defined as



$$\underline{\mathbf{H}} = [\mathbf{H}, \sqrt{\alpha}\, \mathbf{I}_{N_R}]. \tag{25}$$



By introducing the regularization factor $\alpha$, a trade-off between
the level of MUI and noise is introduced [15]. Then, the
MMSE precoding filter is obtained as



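$$\mathbf{P}_{\mathrm{MMSE}} = \mathbf{A} \underline{\mathbf{H}}^H (\underline{\mathbf{H}}\, \underline{\mathbf{H}}^H)^{-1}, \tag{26}$$

where $\mathbf{A} = [\mathbf{I}_{N_T}, \mathbf{0}_{N_T, N_R}]$, and the multiplication by $\mathbf{A}$ will
not result in transmit power amplification since $\mathbf{A} \mathbf{A}^H = \mathbf{I}_{N_T}$.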
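From the mathematical expression in (26), the rows of $\underline{\mathbf{H}}$
determine the effective transmit power amplification of the
MMSE precoding. Correspondingly, the LR transformation for
MMSE precoding should be applied to the transpose of the
extended channel matrix $\underline{\mathbf{H}}_{\mathrm{eff}_i} = [\mathbf{H}_{\mathrm{eff}_i}, \sqrt{\alpha}\, \mathbf{I}_{N_i}]$
for the MMSE precoding, and the LR transformed channel matrix
$\tilde{\underline{\mathbf{H}}}_{\mathrm{eff}_i}$ is obtained as

$$\tilde{\underline{\mathbf{H}}}_{\mathrm{eff}_i} = \underline{\mathbf{T}}_i \underline{\mathbf{H}}_{\mathrm{eff}_i}. \tag{27}$$

Then, the LR-aided MMSE precoding filter is given by

$$\tilde{\mathbf{P}}_{\mathrm{MMSE}_i}^b = \mathbf{A}_i \tilde{\underline{\mathbf{H}}}_{\mathrm{eff}_i}^H (\tilde{\underline{\mathbf{H}}}_{\mathrm{eff}_i} \tilde{\underline{\mathbf{H}}}_{\mathrm{eff}_i}^H)^{-1}. \tag{28}$$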
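The equivalence between MMSE precoding and ZF precoding on the extended system model can be checked numerically. The sketch below is a minimal illustration under assumed dimensions and an assumed value of `alpha`. It compares the extended-model filter of the form (26) with the familiar regularized inverse $\mathbf{H}^H (\mathbf{H}\mathbf{H}^H + \alpha \mathbf{I})^{-1}$, and it does not include the LR step.

```python
import numpy as np

rng = np.random.default_rng(5)
N_R, N_T = 4, 6        # hypothetical dimensions
alpha = 0.1            # assumed regularization factor

H = (rng.standard_normal((N_R, N_T))
     + 1j * rng.standard_normal((N_R, N_T)))

# Extended channel matrix, as in (25): N_R x (N_T + N_R)
H_ext = np.hstack([H, np.sqrt(alpha) * np.eye(N_R)])

# A keeps the first N_T rows of the extended pseudo-inverse
A = np.hstack([np.eye(N_T), np.zeros((N_T, N_R))])

# ZF on the extended model, the form of (26)
P_ext = A @ H_ext.conj().T @ np.linalg.inv(H_ext @ H_ext.conj().T)

# Direct regularized (MMSE) channel inversion
P_mmse = H.conj().T @ np.linalg.inv(H @ H.conj().T + alpha * np.eye(N_R))

print(np.allclose(P_ext, P_mmse))
```

The identity holds because $\underline{\mathbf{H}}\,\underline{\mathbf{H}}^H = \mathbf{H}\mathbf{H}^H + \alpha \mathbf{I}$ and $\mathbf{A}\underline{\mathbf{H}}^H = \mathbf{H}^H$.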



Finally, the second precoding matrix $\tilde{\mathbf{P}}^b$ for all users is

$$\tilde{\mathbf{P}}^b = \mathrm{diag}\{\tilde{\mathbf{P}}_1^b, \tilde{\mathbf{P}}_2^b, \ldots, \tilde{\mathbf{P}}_K^b\}. \tag{29}$$

The resulting precoding matrix is $\mathbf{P} = \mathbf{P}^a \tilde{\mathbf{P}}^b$. Since the lattice
reduced precoding matrix $\tilde{\mathbf{P}}^b$ has near orthogonal columns,
the required transmit power will be reduced compared to the
BD or RBD precoding algorithms. Thus, a better BER performance
than the RBD precoding algorithm can be achieved by
the proposed LR-S-GMI precoding algorithms.
The received signal is finally obtained as

$$\mathbf{y} = \mathbf{H} \mathbf{P} \mathbf{d} + \sqrt{\gamma}\, \mathbf{n}. \tag{30}$$



The main processing work left for the receiver is to quantize 
the received signal y to the nearest data vector and the 
decoding matrix G in (14) is not needed anymore. 
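As a minimal illustration of the quantization step alone, the sketch below maps each received entry to the nearest QPSK point. It assumes unit-energy QPSK symbols, an already scaled received vector, and small hand-picked noise values; the function name `qpsk_quantize` is hypothetical, and the lattice-space mapping listed in Table I is not modeled here.

```python
import numpy as np


def qpsk_quantize(y):
    """Map each entry of y to the nearest unit-energy QPSK point."""
    re = np.where(y.real >= 0, 1.0, -1.0)
    im = np.where(y.imag >= 0, 1.0, -1.0)
    return (re + 1j * im) / np.sqrt(2)


# Example data vector and small, hand-picked perturbations (assumed values)
d = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)
noise = np.array([0.1 - 0.2j, -0.05 + 0.1j, 0.2 + 0.2j, -0.1 - 0.1j])

d_hat = qpsk_quantize(d + noise)
print(np.allclose(d_hat, d))
```

No matrix multiplication is needed at the receiver in this step, which is consistent with the simplified receiver structure described above.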

The proposed precoding algorithms are called LR-S-GMI-ZF
and LR-S-GMI-MMSE depending on the choice of the
second precoding filter as given in (24) and (28), respectively.
We will focus on LR-S-GMI-MMSE since it achieves a better
performance. The implementation steps of the LR-S-GMI-MMSE
precoding algorithm are summarized in Table I. By
replacing steps 8 and 9 in Table I with the formulation in
(24), the LR-S-GMI-ZF precoding algorithm can be obtained.

V. Simulation Results 

A system with $N_T = 8$ transmit antennas and $K = 4$ users,
each equipped with $N_i = 2$ receive antennas, is considered;
this scenario is denoted as the $(2, 2, 2, 2) \times 8$ case. The vector
$\mathbf{d}_i$ of the $i$th user represents data transmitted with QPSK
modulation.

The channel matrix Hi of the ith user is modeled as a 
complex Gaussian channel matrix with zero mean and unit 
variance. We assume an uncorrelated block fading channel. 
We also assume that the channel estimation is perfect at the 
receive side and the feedback channel is error free. The number 
of simulation trials is 10^ and the packet length is 10^ symbols. 
The $E_b/N_0$ is defined as $E_b/N_0 = \frac{N_R E_s}{N_T M N_0}$ with $M$ being the
number of transmitted information bits per channel symbol. 

Fig. 2 shows the BER performance of the proposed and
existing precoding algorithms. The QR/SVD RBD and GMI 
precoding algorithms achieve almost the same BER perfor- 
mance as the conventional RBD precoding. It is clear that the 
S-GMI precoding has a better BER performance compared 
to BD, RBD, QR/SVD RBD and GMI precoding algorithms. 
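
TABLE I
The Proposed LR-S-GMI-MMSE Precoding Algorithm

Applying the MMSE channel inversion
(1)  $\mathbf{H}_{\mathrm{mse}}^{\dagger} = (\mathbf{H}^H \mathbf{H} + \alpha \mathbf{I})^{-1} \mathbf{H}^H$
(2)  for $i = 1 : K$
(3)    $[\mathbf{Q}_{i,\mathrm{mse}} \; \mathbf{R}_{i,\mathrm{mse}}] = \mathrm{QR}(\mathbf{H}_{i,\mathrm{mse}}, 0)$
(4)    $\mathbf{P}_i^a = \mathbf{Q}_{i,\mathrm{mse}}$
(5)    $\mathbf{H}_{\mathrm{eff}_i} = \mathbf{H}_i \mathbf{P}_i^a$
(6)    $\underline{\mathbf{H}}_{\mathrm{eff}_i} = [\mathbf{H}_{\mathrm{eff}_i}, \sqrt{\alpha}\, \mathbf{I}_{N_i}]$
(7)    $[\underline{\mathbf{T}}_i^T \; \tilde{\underline{\mathbf{H}}}_{\mathrm{eff}_i}^T] = \mathrm{CLLL}(\underline{\mathbf{H}}_{\mathrm{eff}_i}^T)$
(8)    $\mathbf{A}_i = [\mathbf{I} \; \mathbf{0}]$
(9)    $\tilde{\mathbf{P}}_i^b = \mathbf{A}_i \tilde{\underline{\mathbf{H}}}_{\mathrm{eff}_i}^H (\tilde{\underline{\mathbf{H}}}_{\mathrm{eff}_i} \tilde{\underline{\mathbf{H}}}_{\mathrm{eff}_i}^H)^{-1}$
(10) end
Compute the overall precoding matrix
(11) $\mathbf{P}^a = [\mathbf{P}_1^a \; \mathbf{P}_2^a \; \cdots \; \mathbf{P}_K^a]$
(12) $\tilde{\mathbf{P}}^b = \mathrm{diag}\{\tilde{\mathbf{P}}_1^b, \tilde{\mathbf{P}}_2^b, \ldots, \tilde{\mathbf{P}}_K^b\}$
(13) $\mathbf{P} = \mathbf{P}^a \tilde{\mathbf{P}}^b$
Calculate the scaling factor $\gamma$
(14) $\gamma = \|\mathbf{P}\mathbf{d}\|_2^2 / E_s$
Get the received signal
(15) $\mathbf{y} = \mathbf{H}\mathbf{P}\mathbf{d} + \sqrt{\gamma}\, \mathbf{n}$
Transform back from lattice space
(16) $\hat{\mathbf{d}} = \mathbf{T} \lceil \mathbf{y} \rfloor$
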



The proposed LR-S-GMI-MMSE precoding algorithm shows 
the best BER performance. At the BER of 10"^, the LR- 
S-GMI-MMSE precoding has a gain of more than 5.5 dB 
compared to the RBD precoding. It is worth noting that the
proposed LR-S-GMI-MMSE precoding outperforms the RBD
precoding in terms of BER over the whole
$E_b/N_0$ range, and the BER gains become more significant
as $E_b/N_0$ increases. The proposed
LR-S-GMI-MMSE precoding algorithm provides a better BER
performance than the existing algorithms because it provides
a better channel quality as measured by the condition number
of the effective channel.

Fig. 3 illustrates the sum-rate of the above precoding
algorithms. The information rate is calculated using [18]:

$$C = \log_2(\det(\mathbf{I} + \sigma_n^{-2} \mathbf{H} \mathbf{P} \mathbf{P}^H \mathbf{H}^H)) \quad \text{(bits/Hz)}. \tag{31}$$
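The rate expression in (31) can be evaluated with a log-determinant routine. The following is a small sketch, assuming a base-2 logarithm (so that the result is in bits) and a known noise variance; the function name `sum_rate` is hypothetical.

```python
import numpy as np


def sum_rate(H, P, sigma_n2):
    """Evaluate log2 det(I + H P P^H H^H / sigma_n2), the form of (31)."""
    G = H @ P
    M = np.eye(H.shape[0]) + (G @ G.conj().T) / sigma_n2
    _, logdet = np.linalg.slogdet(M)     # natural-log determinant, numerically stable
    return logdet / np.log(2)            # convert from nats to bits


# Sanity check on a trivial 2 x 2 system with identity channel and precoder
print(sum_rate(np.eye(2), np.eye(2), 1.0))
```

For the trivial case shown, the matrix inside the determinant is $2\mathbf{I}$, whose determinant is 4, so the rate is 2 bits/Hz. Using `slogdet` instead of `det` avoids overflow for larger systems.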

BD precoding with WF power loading shows a better sum-rate
performance than BD precoding without power loading.
However, as shown in Fig. 2, the BER performance
is degraded by applying this WF scheme. Similar to the
BER figure, the RBD, QR/SVD RBD and GMI precoding
algorithms show a comparable sum-rate performance. The S-GMI
precoding also achieves the sum-rate performance of the
RBD precoding. The proposed LR-S-GMI-MMSE precoding
algorithm shows almost the same sum-rate performance as the
RBD precoding at low $E_b/N_0$ values. At high $E_b/N_0$ values, however,
the sum-rate performance of LR-S-GMI-MMSE precoding is
slightly inferior to that of the RBD precoding and approaches
the performance of BD precoding.

The required floating point operations (FLOPs) for the conventional
BD, RBD and QR/SVD RBD precoding algorithms
are given in [19], [20]. The reduction in the number of FLOPs
obtained by the proposed LR-S-GMI-MMSE is 73.6%, 69.5%
and 59.1% as compared to the RBD, BD and QR/SVD RBD
precoding algorithms, respectively.

Fig. 2. BER performance, (2, 2, 2, 2) × 8 MU-MIMO

Fig. 3. Sum-rate performance, (2, 2, 2, 2) × 8 MU-MIMO

VI. Conclusion

In this paper, low-complexity precoding algorithms based
on a channel inversion technique, QR decompositions, and
lattice reductions have been proposed for MU-MIMO systems.
The complexity of the precoding process is reduced and a
considerable BER gain is achieved by the proposed LR-S-GMI
precoding algorithms at a cost of a slight sum-rate loss at high
SNRs. Since the proposed LR-S-GMI precoding algorithms
are only implemented at the transmit side, the decoding matrix
is no longer needed at the receive side, in contrast to the RBD
precoding algorithm. The structure of the receiver can thus be
simplified, which is an additional benefit of the proposed
LR-S-GMI precoding algorithms.